In UVA’s LinkLab, as a part of the Living Link Lab Program, there is a significant amount of environmental, occupancy data, and HVAC system operational data available for analysis. This presents an opportunity for a detailed case study of the performance of an HVAC system among multiple metrics outside of just temperature and humidity with the intention of improving HVAC control systems’ ability to maintain occupant comfort with reduced energy consumption. Investigation of multiple metrics of occupant comfort, whether a given room even has occupants which require comfort, various metrics of system operation, and energy consumption consumption (actual and calculated) has the potential to produce a environmental model which may aid in the development of an improved policy for the HVAC system control problem.
Considering HVAC usage accounts for 30% of total commercial building energy consumption (US Dept. of Energy Commissioned Report on Energy Savings Potential and RD&D Opportunities for Commercial Building HVAC Systems, 2017), there is significant environmental and economic incentive to reducing the energy load of HVAC systems both for regulators and commercial operators. This same commissioned report cites “Technology Enhancements for Current Systems” as one of four groups of high priority technology options, with “Advanced HVAC Sensors” as the top ranked technology within this category at an estimated Technical Energy Savings Potential (Quadrillion BTU/yr.) of 0.63, lending particular credence to the idea that advanced sensing combined with more efficient control could be a significant contributor to reduced HVAC system burden on energy resources.
This project will investigate the first phase of determining if it possible to create a more efficient control policy for HVAC systems by developing a model of energy consumption calculated from system operation metrics, validating this model with a limited amount of actual collected energy usage data, and attempting to forecast energy demand with this data.
The data which will be analyzed consists of data exported from an internal UVA server which hosts all HVAC system data related to Olsson Hall and the Link Lab. Additionally, local weather data on humidity, atmospheric pressure, and local temperature data from the KVACHARL114 station attached to Olsson hall will be obtained as potential predictors of energy consumption.
library(dplyr)
library(forecast)
package 㤼㸱forecast㤼㸲 was built under R version 4.0.5Registered S3 method overwritten by 'quantmod':
method from
as.zoo.data.frame zoo
library(fable)
Loading required package: fabletools
package 㤼㸱fabletools㤼㸲 was built under R version 4.0.4
Attaching package: 㤼㸱fabletools㤼㸲
The following objects are masked from 㤼㸱package:forecast㤼㸲:
accuracy, forecast
library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
-- Attaching packages ---------------------------------------------------------------------------------------- tidyverse 1.3.0 --
v ggplot2 3.3.3 v purrr 0.3.4
v tibble 3.0.6 v stringr 1.4.0
v tidyr 1.1.2 v forcats 0.5.1
v readr 1.4.0
package 㤼㸱stringr㤼㸲 was built under R version 4.0.5-- Conflicts ------------------------------------------------------------------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
library(lubridate)
package 㤼㸱lubridate㤼㸲 was built under R version 4.0.4
Attaching package: 㤼㸱lubridate㤼㸲
The following objects are masked from 㤼㸱package:base㤼㸲:
date, intersect, setdiff, union
library(fpp3)
-- Attaching packages --------------------------------------------------------------------------------------------- fpp3 0.4.0 --
v tsibble 1.0.0 v feasts 0.2.1
v tsibbledata 0.3.0
package 㤼㸱tsibbledata㤼㸲 was built under R version 4.0.4package 㤼㸱feasts㤼㸲 was built under R version 4.0.4-- Conflicts -------------------------------------------------------------------------------------------------- fpp3_conflicts --
x lubridate::date() masks base::date()
x dplyr::filter() masks stats::filter()
x fabletools::forecast() masks forecast::forecast()
x tsibble::intersect() masks base::intersect()
x tsibble::interval() masks lubridate::interval()
x dplyr::lag() masks stats::lag()
x tsibble::setdiff() masks base::setdiff()
x tsibble::union() masks base::union()
library(ggplot2)
library(plotly)
package 㤼㸱plotly㤼㸲 was built under R version 4.0.4Registered S3 method overwritten by 'data.table':
method from
print.data.table
Registered S3 method overwritten by 'htmlwidgets':
method from
print.htmlwidget tools:rstudio
Attaching package: 㤼㸱plotly㤼㸲
The following object is masked from 㤼㸱package:ggplot2㤼㸲:
last_plot
The following object is masked from 㤼㸱package:stats㤼㸲:
filter
The following object is masked from 㤼㸱package:graphics㤼㸲:
layout
load(file="C:\\Users\\canea\\Documents\\UVA\\Spring 2021\\Timeseries\\neale-caleb\\data\\cleanedData.Rdata")
Index: Time (30min)
Key: HVAC Air Handling Unit (AHU2E or AHU2W)
Values of Interest: - SAT - Output Air Temperature (deg C) - MAT - Input Air Temperature (deg C) - SAF - Air Supply Flow (CFM) - SAFSP - Air Supply Flow Set point (CFM) - SAEnthalpy - Output air enthalpy (calculated, kW) - MAEnthalpy - Input air enthalpy (calculated, kW) - totalHeat - Change in enthalpy (calculated, kW) - fanEnergy - Energy used by fan (calculated, kW)
head(equip)
Index: Time (30min)
Key: Room in Olsson Hall
Values: - SAF - Supply Air Flow (CFM) - outputTemp - Reheated air temperature from room level treatment (deg C) - ZNT - Room temperature setpoint (deg C) - SAFSP - Supply air flow setpoint (CFM) - equipment - AHU which serves given room (AHU2E or AHU2W) - inputTemp - Air temperature from AHU given to room for reheat (deg C) - inputEnthalpy - Energy of input air (calculated, kW) - outputEnthalpy - Energy of output air (calculated, kW) - totalHeat - Energy used in reheat (calculated, kW)
head(rooms)
Index: Time (5min)
Key: StationID, KVACHARL114
Values: - humidityAvg - Average relative humidity over 5 min interval (%) - tempAvg - Average temperature over 5 min interval (deg C) - pressureMax - Maximum air pressure over 5 min interval (torr need to confirm unit)
head(weather)
Index: Time (30min)
Key: HVAC Air Handling Unit (AHU2E or AHU2W)
Values: - totalHeat.AHU - Energy used in AHU level treatment (calculated, kW) - fanEnergy - Energy used by fan (calculated, kW) - totalHeat.room - Energy used in room level treatment (calculated, kW) - totalEnergy - sum of all three above energy categories (calculated, kW)
head(totalEnergyDF %>% filter(!is.na(totalEnergy)))
autoplot(totalEnergyDF, .vars = totalEnergy) + ylab("Calculated System Energy (kW)")
The HVAC system under consideration is a VAV (variable air volume) system which consists of two air handling units (AHU) and VAV boxes. VAV systems manage the temperature of the different rooms in the building by providing specific volumes of air to each room using equipment known as VAV boxes. At the VAV box, air may be reheated to provide the room with the correct temperature of air needed to maintain the environment at the given setpoint. Our simplified model of the AHU consists of hot and cold water coils for heating/cooling, a return air fan, and a supply air fan. Each VAV box consists of a vent regulating air volume and a heating coil.
Return air is defined as air which is being returned from the conditioned environment. In this system, return air is mixed with outdoor air to produce mixed air. Mixed air serves as the input to the conditioning system at the AHU level. Once the air is treated (heating/cooling, humidity removal) by the AHU, it is output as supply air. This supply air is then provided to each VAV box for additional heating as needed, producing final supply air. As this lack of distinction between these two versions of “supply air” may be confusing, this analysis will adopt the terms “input air” and “output air” when discussing both the VAV and AHU treatment processes.
Air supply flow is measured at the VAV box level and aggregated to determine AHU air volumes.
As treatment can occur at both the AHU and VAV box, enthaply change (change in energy due to heating/cooling and humidity removal) must be calculated at both stages and then aggregated to find total enthalpy change at a given time.
Weather data was collected from Need source for this data using an API and then merged into the data using the times. The location of this weather data is TBD and was used to estimate local temperature, pressure, and relative humidity. Check units
The primary goal of this analysis is to generate an effective model of how HVAC system operation affects energy consumption over time, which can be divided into three parts:
The primary usages of energy in this HVAC system come from treatment at the AHU, treatment at the VAV box, and fan operation in the AHU. Calculations for treatment at the AHU and VAV box level leverage the same equations, and fan equations can be leveraged to estimate fan energy usage.
The change in energy due to air treatment (at either AHU or VAV) can be divided into latent and sensible heat. Changes in latent heat arise from humidity added or removed from the air whereas sensible heat is affected by changes in air temperature. Enthalpy change accounts for both latent and sensible heat as is calculated using fix with Zotero (https://rdrr.io/github/chrras/climateeng/man/). Equation 1, below, can then be used to determine total heat by combining calculated enthalpy change with collected data on air volume flow.
Zotero source h_t = ρ*q*dh
where:
h_t = total heat (kW)
q = air volume flow (m3/s)
ρ = density of air (1.202 kg/m3)
dh = enthalpy difference (kJ/kg)
load("C:\\Users\\canea\\Documents\\UVA\\Spring 2021\\Timeseries\\neale-caleb\\data\\preCalcData.RData")
library(climateeng)
#Define constants
RHconstant = 0.010
density = 1.202
CFMtoM_SConverstionFactor = 0.00047194745
# Calculate AHU level enthalpy
equip$SAEnthalpy = enthalpy(equip$SAT, RHconstant)
equip$MAEnthalpy = enthalpy(equip$MAT, RHconstant)
equip$totalHeat = abs(equip$SAEnthalpy -equip$MAEnthalpy)*CFMtoM_SConverstionFactor*equip$SAF*density
# create table mapping room to equpment from system documentation
AHU_2E <- c(241, 243, 245, 247, 249, 251, 253, 257, 255, 259, 263, 261, 240, "C244", 244, 260, 213, 217, 225, 218, "C230", 220, "C210", "T212", "T218", "C216", "C214", "T210", 256, 229, 231, 223, "C227", "C211", 254, "C250", 258, "C213", 221)
AHU_2W <- c(269, 267, 265, 273, 271, 275, 277, 279, 281, 283, 285, 274, 286, 204, 208, 272, 270, "C260", "C200", "C201", 203, 276, "C280", "C270", 211, 201)
rooms_tbl <- rbind(tibble('room' = AHU_2E, 'equipment' = "AHU2E"), tibble('room' = AHU_2W, 'equipment' = "AHU2W"))
# extract relevant information from room coding
rooms$cleaned_room = str_extract(rooms$room, "C\\d{3}|^\\d{3}")
# get relevant equipment for each room
rooms = left_join(rooms, rooms_tbl, by=c("cleaned_room" = "room"))
rooms %>% filter(!is.na(room)) -> rooms
rooms = rooms %>% select(-cleaned_room) -> rooms
# get input air temperature for each room
rel = equip %>% select(SAT)
rooms = left_join(rooms, rel, by=c("equipment", "time"))
# rename columns for clarity
colnames(rooms)[3] = "outputTemp"
colnames(rooms)[8] = "inputTemp"
rooms$inputEnthalpy = enthalpy(rooms$inputTemp, RHconstant)
rooms$outputEnthalpy = enthalpy(rooms$outputTemp, RHconstant)
rooms$totalHeat = abs(rooms$inputEnthalpy - rooms$outputEnthalpy)*CFMtoM_SConverstionFactor*rooms$SAF*density
Power used by a fan can be calculated using equation 2, below. Pressure set to a constant typical of commercial HVAC fans, fan efficiency will be assumed to be 60%, and air volume is measured.
Zotero source P = dp*q/mu
where:
mu = fan efficiency (values between 0 - 1)
dp = total pressure (Pa)
q = air volume delivered by the fan (m3/s)
P = power used by the fan (W, Nm/s)
# Set constants
fanEfficiencyCoef = 0.6
pressure = 4000
Wto_kW =1000
# Calculate energy (2* for two fans)
equip$fanEnergy = 2*equip$SAF*CFMtoM_SConverstionFactor*pressure/fanEfficiencyCoef/Wto_kW
rooms %>% group_by(equipment) %>% summarise(totalHeat = sum(totalHeat)) -> roomAHUHeat
equip %>% select(totalHeat, fanEnergy) %>% inner_join(roomAHUHeat, by=c("time", "equipment"), suffix = c(".AHU", ".room")) -> totalEnergyDF
totalEnergyDF$totalHeat = totalEnergyDF$totalHeat.AHU + totalEnergyDF$totalHeat.room
totalEnergyDF$totalEnergy = totalEnergyDF$totalHeat + totalEnergyDF$fanEnergy
totalEnergyDF = fill_gaps(totalEnergyDF)
gg_season(totalEnergyDF, totalEnergy, period = "week")
gg_season(totalEnergyDF, totalEnergy, period = "day")
gg_season(totalEnergyDF, totalEnergy, period = "month")
Energy consumption has a clear daily cycle where the system is off over night then turns on at a time early enough in the morning to ensure that the internal temperature in the building meets the setpoint by the time occupants arrive. The shifting start times throughout the year as seen on the daily seasonal plot are a result of this control mechanism which predicts how long the system will need to arrive at the setpoint given the current internal temperature and begins treatment accordingly. During colder/warmer seasons, internal temperature drifts farther from the setpoint overnight and thus takes longer to return to the setpoint.
AHU2W appears to have a lower bound of energy consumption above that of AHU2E. Investigating the components of totalEnergy reveals that this lower bound comes from increased nighttime fan energy and AHU level energy consumption.
gg_season(fill_gaps(equip), fanEnergy, period = "day") + ylab("Fan Energy (kW)")
gg_season(fill_gaps(roomAHUHeat), totalHeat, period = "day") + ylab("Room Level Reheat Energy (kW)")
gg_season(fill_gaps(equip), totalHeat, period = "day") + ylab("AHU Level Treatment Energy (kW)")
Breaking down energy consumption in this manner also reveals that fan energy consumption follows a daily cycle but is not as significantly influenced by time of year as room or AHU treatment consumption. Interestingly as well, though both AHU and room energy consumption show a yearly trend, they appear to be nearly inverse. Room consumption is highest on blue days and lowest on pink, while AHU consumption appears to be highest on orange and lowest on blue.
# get longform data for plotting
totalEnergyDF %>% select(-totalHeat, -totalEnergy) %>% filter(equipment == "AHU2E") %>% pivot_longer(cols = c(totalHeat.AHU, fanEnergy, totalHeat.room), names_to = "datapoint", values_to = "value") -> longAHU2E
totalEnergyDF %>% select(-totalHeat, -totalEnergy) %>% filter(equipment == "AHU2W") %>% pivot_longer(cols = c(totalHeat.AHU, fanEnergy, totalHeat.room), names_to = "datapoint", values_to = "value") -> longAHU2W
# stacked area plots
ggplot(data = longAHU2W, aes(x=time, y=value, fill=datapoint)) + geom_area()
ggplot(data = longAHU2E, aes(x=time, y=value, fill=datapoint)) + geom_area()
# proportional stacked area plots
ggplot(longAHU2W, aes(x=time, y=value, fill=datapoint)) +
geom_area(position = "fill") + scale_fill_brewer(palette = "Blues")
ggplot(longAHU2E, aes(x=time, y=value, fill=datapoint)) +
geom_area(position = "fill") + scale_fill_brewer(palette = "Blues")
It appears that room reheating uses a much larger proportion of the energy of the system in months where heating is more likely to be active than cooling and the proportion of energy expended on the fans increases dramatically as overall system energy use declines. Though we don’t have a complete year of data, it appears shoulder seasons (fall/spring) consume less energy than winter/summer, in line with what we might expect given outdoor temperatures.
weather_merge = weather
# shift weather data by a minute to allow for merging
weather_merge$obsTimeLocal = weather_merge$obsTimeLocal + 60
# merge data
left_join(totalEnergyDF, weather_merge, by=c("time" = "obsTimeLocal")) %>% filter(!is.na(totalEnergy))-> weather_energy
# plot, filtering out low energy values for when HVAC system is "off"
ggplot(data = weather_energy %>% filter(totalEnergy > 50), aes(x=tempAvg, y=totalEnergy)) + geom_point() + geom_smooth()
As the previous charts showing lower consumption in shoulder seasons implied, energy consumption appears to be roughly quadratic with respect to temperature.
Historical data is available for building-wide energy consumption, divided into:
head(buildingEnergy)
As HVAC operation represents a significant part of building energy consumption, we would expect that trends in and changes of HVAC system energy consumption would closely track with building wide energy consumption. Calculated historical energy consumption is plotted next to building-wide energy consumption for inspection:
library(plotly)
equip %>% filter(!is.na(SAT) & !is.na(MAT)) -> equipVerify
# determine cooling status based on comparative temperature of input and output air
equipVerify <- equipVerify %>% mutate(cooling =
if_else(
MAT > SAT, 1, 0
)
)
left_join(equipVerify, buildingEnergy, by="time") -> coolingVerify
coolingVerify <- coolingVerify %>% filter(cooling == 1) %>% as_tsibble(key = equipment, index = time)
# plot and model actual cooling energy consumption as a funciton of calcualted heating consumption
ggplot(data = coolingVerify, aes(x=chilledWater, y=totalHeat)) + geom_point() + geom_smooth(method = "lm")
fig <- plot_ly(coolingVerify, x = ~time, y = ~totalHeat, name = 'AHU Heat', type = 'scatter', mode = 'lines')
fig %>% add_trace(y = ~(chilledWater*1000)/3412.142, name = 'Chilled Water', mode = 'lines')
coolingLM = lm(chilledWater~totalHeat, data = coolingVerify)
summary(coolingLM)
Call:
lm(formula = chilledWater ~ totalHeat, data = coolingVerify)
Residuals:
Min 1Q Median 3Q Max
-842.1 -161.9 -100.5 154.3 890.0
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 106.01886 3.52299 30.09 <2e-16 ***
totalHeat 4.57092 0.09118 50.13 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 250.1 on 11108 degrees of freedom
(614 observations deleted due to missingness)
Multiple R-squared: 0.1845, Adjusted R-squared: 0.1844
F-statistic: 2513 on 1 and 11108 DF, p-value: < 2.2e-16
A linear relationship between the calculated amount of energy used in cooling and historical chilled water consumption appears to exist, though a linear model between these two quantities shows an R^2 of 0.1844. This suggests that tuning within set parameters used in calculated energy consumption might improve the accuracy of the calculations.
# aggregate room level contributions to energy consumption by AHU
rooms %>% group_by(equipment) %>% summarise(totalHeat = sum(totalHeat)) -> roomAHUHeat
# join room and equipment level data
equip %>% inner_join(roomAHUHeat, by=c("time", "equipment"), suffix = c(".AHU", ".room")) -> totalEnergyDF
# create DF for enery verificaiton and determine heating status based on comparative temperature of input and output air
totalEnergyDF %>% filter(!is.na(SAT) & !is.na(MAT)) -> totalEnergyDFVerify
totalEnergyDFVerify <- totalEnergyDFVerify %>%
mutate(isHeating = if_else(
SAT > MAT, 1, 0
)
)
# join in actual energy consumption data
left_join(totalEnergyDFVerify, buildingEnergy, by="time") -> heatingVerify
heatingVerify %>% filter(!is.na(heatingVerify$isHeating)) -> heatingVerify
# calculate total heating energy based on AHU heating status
heatingVerify <- heatingVerify %>%
mutate(heatEnergyCalc = if_else(
isHeating == 1, totalHeat.AHU + totalHeat.room, totalHeat.room
)
)
# plot and model actual heating energy consumption as a funciton of calcualted heating consumption
ggplot(data = heatingVerify, aes(x=heating, y=heatEnergyCalc)) + geom_point() + geom_smooth(method = "lm")
fig <- plot_ly(heatingVerify, x = ~time, y = ~heatEnergyCalc, name = 'AHU Heat', type = 'scatter', mode = 'lines')
# heating is adjusted from kBTU to kW
fig %>% add_trace(y = ~(heating*1000)/3412.142, name = 'Heating', mode = 'lines')
heatingLM = lm(heating~heatEnergyCalc, data = heatingVerify)
summary(heatingLM)
Call:
lm(formula = heating ~ heatEnergyCalc, data = heatingVerify)
Residuals:
Min 1Q Median 3Q Max
-559.0 -154.0 -133.0 135.4 783.2
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 214.33108 1.82976 117.14 <2e-16 ***
heatEnergyCalc 2.21688 0.04549 48.73 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 242.3 on 25805 degrees of freedom
(1749 observations deleted due to missingness)
Multiple R-squared: 0.08428, Adjusted R-squared: 0.08425
F-statistic: 2375 on 1 and 25805 DF, p-value: < 2.2e-16
Modeling actual heat energy consumption as a linear function of calculated heat consumption results in an R^2 of only 0.084, though the model performs better than the null.
Further investigation is clearly needed to determine whether there are other significant contributors to consumption of heating and cooling energy within the building, or if the model needs more development (tuning of constant parameters used in calclations potentially). As a further point of research, having energy readings directly metered from the HVAC unit would likely provide significantly more accurate data, especially if developing an improved control mechanism is ever attempted.
Given the previously shown daily cycle of the data, shoulder season behaviour, and quadratic relationship with local temperate, regression with ARIMA errors seems a reasonable place to begin to model the data generating process behind total energy use in the system.
An investigation of lags and autocorrelation will provide the basis for the parameter selection of the ARIMA model.
totalEnergyDF %>% select(totalEnergy) %>% filter(equipment == "AHU2W") %>% gg_lag(y=totalEnergy, geom = "point")
Removed 15 rows containing missing values (gg_lag).
Potentially given the lower bound previously seen on AHU2W energy consumption, a stronger relationship appears in the lags of AHU2E than AHU2W.
# ACFs
totalEnergyDF %>% filter(equipment == "AHU2E") %>% ACF(totalEnergy, lag_max = 140) %>% autoplot() + ylab("AHU2E ACF")
totalEnergyDF %>% filter(equipment == "AHU2W") %>% ACF(totalEnergy, lag_max = 140) %>% autoplot() + ylab("AHU2W ACF")
# PACFs
totalEnergyDF %>% filter(equipment == "AHU2E") %>% PACF(totalEnergy, lag_max = 140) %>% autoplot() + ylab("AHU2E PACF")
totalEnergyDF %>% filter(equipment == "AHU2W") %>% PACF(totalEnergy, lag_max = 140) %>% autoplot() + ylab("AHU2W PACF")
Given a 48 period lag corresponds to one day, the behaviour seen in the ACF plots show exactly what we might expect; ACF is highest at multiples of 24 hours from a given point. PACF plots also show highest PACF values at about the 24 hour mark, though additional 24 hour extensions provide significantly less information.
However, given the likely yearly seasonality of the data, an investigation into the differences of the data is needed.
totalEnergyDF %>% filter(equipment == "AHU2E") %>% gg_tsdisplay(difference(totalEnergy), plot_type = "partial", lag_max = 140)
totalEnergyDF %>% filter(equipment == "AHU2W") %>% gg_tsdisplay(difference(totalEnergy), plot_type = "partial", lag_max = 140)
On my machine, the models in the following sections take a while to train (about 20-30 minutes). Saved versions of the models are saved in the “models” folder in this repository and can be loaded using “load(‘models/[modelname].RData’)” for faster computing.
The ACF is suggestive of an MA(52) model while the PACF suggests an AR(54) model, so ARIMA(0, 1, 52) and ARIMA(54, 1, 0) might be a good place to start for initially fit models. However, a more appropriate approach would likely be setting the seasonality of the data to daily (48 periods) and attempting ARIMA from there. This may not accurately address yearlong seasonal trends, but weather data may account for this in a later dynamic regression model.
This model seems to capture the daily cycle of energy consumption quite well, but doesn’t well account for the influence of temperature or the general variation in the data. To determine if another ARIMA model would perform better, I’ll use fable::ARIMA to investigate three additional ARIMA models before attempting dynamic regression with ARIMA errors.
residuals(seasonal) %>%
as_tsibble(key=.model, index=time) %>%
gg_season(y=.resid, period = "week")
residuals(stepwise) %>%
as_tsibble(key=.model, index=time) %>%
gg_season(y=.resid, period = "week")
residuals(search) %>%
as_tsibble(key=.model, index=time) %>%
gg_season(y=.resid, period = "week")
It appears the seasonal model has residuals which are the most normally distributed and which most closely approach white noise. Seasonal plots also show more balance in the seasonally adjusted model compared to the large spikes at the beginning and end of each day in the search and stepwise models. Evaluating forecasts gives a clearer picture.
searchFC %>% autoplot()
stepwiseFC %>% autoplot()
seasonalFC %>% autoplot()
Only the model with a 48 seasonal term captures any of the daily cycle in the data, whereas the other two models seems to take a blind guess at the mean value of the process.
The ARIMA model seems to capture the daily cycle in the data well, but daily and seasonal variation exists and this variation could likely be explained by daily and seasonal changes in outdoor temperatures. To capture this variation, dynamic regression with ARIMA error can be applied.
report(dynRegTrain)
Series: totalEnergy
Model: LM w/ ARIMA(5,0,0)(1,1,0)[48] errors
Coefficients:
ar1 ar2 ar3 ar4 ar5 sar1 tempAvg I(tempAvg^2)
0.9032 -0.1656 0.1168 -0.1735 0.1639 -0.2458 -0.3918 0.0172
s.e. 0.0094 0.0127 0.0127 0.0128 0.0094 0.0096 0.2496 0.0074
sigma^2 estimated as 272.8: log likelihood=-49011.45
AIC=98040.9 AICc=98040.91 BIC=98107.41
This dynamic regression model seems to be the best at capturing additional daily variation which may be attributed to temperature. Looking at the residuals on a weekly basisi over the year, it can be seen that the yearly seasonal component is not being completely accounted for, especially in the changing startup time of the system over the course of the year.
The data generating process for system energy consumption can be modeled as a dynamic regression with ARIMA errors in the form:
y = β_0 + β_1*tempAvg + β_2*tempAvg^2 + ε
where ε is ARIMA (5,0,0,)(1,1,0)[48]
Describe how the formal statistical model captures and aligns with the narrative of the data-generating process. Flag any statistical challenges raised by the data generating process, e.g. selection bias; survivorship bias omitted variables bias etc.